A Method for Large-Scale ℓ1-Regularized Logistic Regression
Authors
Abstract
Logistic regression with ℓ1 regularization has been proposed as a promising method for feature selection in classification problems. Several specialized solution methods have been proposed for ℓ1-regularized logistic regression problems (LRPs). However, existing methods do not scale well to the large problems that arise in many practical settings. In this paper we describe an efficient interior-point method for solving ℓ1-regularized LRPs. Small problems, with up to a thousand or so features and examples, can be solved in seconds on a PC. A variation on the basic method, which uses a preconditioned conjugate gradient method to compute the search step, can solve large sparse problems, with a million features and examples (e.g., the 20 Newsgroups data set), in a few tens of minutes on a PC. Numerical experiments show that our method outperforms standard methods for solving convex optimization problems as well as other methods specifically designed for ℓ1-regularized LRPs.

Introduction

Logistic regression

Let x ∈ R^n denote a vector of feature variables, and b ∈ {−1, +1} denote the associated binary output. In the logistic model, the conditional probability of b, given x, has the form

Prob(b|x) = 1 / (1 + exp(−b(w^T x + v))).

The parameters of this model are v ∈ R (the intercept) and w ∈ R^n (the weight vector). Suppose we are given a set of training or observed examples, (x_i, b_i) ∈ R^n × {−1, +1}, i = 1, ..., m, assumed to be independent samples from a distribution. The model parameters w and v can be found by maximum likelihood estimation from the observed examples. The maximum likelihood estimate minimizes the average loss

lavg(v, w) = (1/m) ∑_{i=1}^{m} log(1 + exp(−b_i(w^T x_i + v))).
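The average logistic loss above is straightforward to compute directly. As a minimal illustration (the function name and toy data below are ours, not from the paper), note that at w = 0, v = 0 every example is assigned probability 1/2, so the loss equals log 2:

```python
import numpy as np

def avg_logistic_loss(v, w, X, b):
    # lavg(v, w) = (1/m) * sum_i log(1 + exp(-b_i (w^T x_i + v)))
    z = b * (X @ w + v)
    # np.logaddexp(0, -z) evaluates log(1 + exp(-z)) in a numerically stable way
    return np.mean(np.logaddexp(0.0, -z))

# Toy data: m = 4 examples, n = 2 features
X = np.array([[1.0, 2.0], [-1.0, 0.5], [0.3, -1.2], [2.0, 1.0]])
b = np.array([1.0, -1.0, -1.0, 1.0])

loss_at_zero = avg_logistic_loss(0.0, np.zeros(2), X, b)  # equals log 2
```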
Similar references
An Efficient Method for Large-Scale l1-Regularized Convex Loss Minimization
Convex loss minimization with l1 regularization has been proposed as a promising method for feature selection in classification (e.g., l1-regularized logistic regression) and regression (e.g., l1-regularized least squares). In this paper we describe an efficient interior-point method for solving large-scale l1-regularized convex loss minimization problems that uses a preconditioned conjugate gr...
A Method for Large-Scale l1-Regularized Logistic Regression
Logistic regression with l1 regularization has been proposed as a promising method for feature selection in classification problems. Several specialized solution methods have been proposed for l1-regularized logistic regression problems (LRPs). However, existing methods do not scale well to large problems that arise in many practical settings. In this paper we describe an efficient interior-poi...
A coordinate gradient descent method for ℓ1-regularized convex minimization
In applications such as signal processing and statistics, many problems involve finding sparse solutions to under-determined linear systems of equations. These problems can be formulated as structured nonsmooth optimization problems, i.e., minimizing an ℓ1-regularized linear least squares objective. In this paper, we propose a block coordinate gradient descent method (abbreviated a...
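The ℓ1-regularized least squares setting described here can be illustrated with plain cyclic coordinate descent using soft-thresholding. This is a simplified sketch under our own naming and toy problem, not the block coordinate gradient method the paper proposes:

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of t * |.|: shrinks z toward zero by t
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(A, y, lam, n_iter=100):
    """Cyclic coordinate descent for min_x 0.5*||A x - y||^2 + lam*||x||_1."""
    m, n = A.shape
    x = np.zeros(n)
    col_sq = (A ** 2).sum(axis=0)   # squared column norms
    r = y - A @ x                   # running residual
    for _ in range(n_iter):
        for j in range(n):
            if col_sq[j] == 0.0:
                continue
            r = r + A[:, j] * x[j]  # remove coordinate j's contribution
            x[j] = soft_threshold(A[:, j] @ r, lam) / col_sq[j]
            r = r - A[:, j] * x[j]  # restore residual with updated x[j]
    return x
```

When A is the identity, each coordinate update is exact and the minimizer is simply the soft-thresholded observation vector.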
Distributed Newton Method for Regularized Logistic Regression
Regularized logistic regression is a very successful classification method, but for large-scale data, its distributed training has not been investigated much. In this work, we propose a distributed Newton method for training logistic regression. Many interesting techniques are discussed for reducing the communication cost. Experiments show that the proposed method is faster than state of the ar...
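For reference, the Newton iteration underlying such methods can be sketched on a single machine for the ℓ2-regularized logistic objective; the distributed training and communication-reduction techniques the paper discusses are not shown, and the function name and toy data are our own:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def newton_logreg(X, b, lam=0.1, n_iter=20):
    """Plain Newton's method for
    min_w (1/m) sum_i log(1 + exp(-b_i w^T x_i)) + (lam/2) ||w||^2."""
    m, n = X.shape
    w = np.zeros(n)
    for _ in range(n_iter):
        z = b * (X @ w)
        s = sigmoid(-z)                       # per-example loss derivative magnitude
        grad = -(X * (b * s)[:, None]).sum(axis=0) / m + lam * w
        d = sigmoid(z) * sigmoid(-z)          # per-example curvature weights
        H = (X * d[:, None]).T @ X / m + lam * np.eye(n)
        w = w - np.linalg.solve(H, grad)      # full Newton step
    return w

# Toy 1-D problem: label matches the sign of the feature
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
b = np.array([-1.0, -1.0, 1.0, 1.0])
w = newton_logreg(X, b)
```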
An Interior-Point Method for Large-Scale l1-Regularized Logistic Regression
Logistic regression with ℓ1 regularization has been proposed as a promising method for feature selection in classification problems. In this paper we describe an efficient interior-point method for solving large-scale ℓ1-regularized logistic regression problems. Small problems with up to a thousand or so features and examples can be solved in seconds on a PC; medium-sized problems, with tens of...